161 results found.
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic English Farsi French German Hindi Japanese Korean Mandarin Russian Spanish Tamil Vietnamese
Availability:
From Owner
License:
LDC
Size:
46 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2003 NIST Language Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bengali Dari English German Hindi Iranian Persian Japanese Korean Mandarin Chinese Persian Russian Spanish Standard Arabic Tamil Thai Vietnamese Yue Chinese
Availability:
From Owner
License:
LDC
Size:
66 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2007 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Amharic Bosnian Croatian Dari English French Georgian Haitian Hausa Hindi Korean Mandarin Chinese Persian Portuguese Pushto Russian Spanish Turkish Ukrainian Urdu Vietnamese Yue Chinese
Availability:
From Owner
License:
LDC
Size:
215 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2009 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bengali Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Khmer Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu
Availability:
From Owner
License:
LDC
Size:
640 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Khmer Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu
Availability:
From Owner
License:
LDC
Size:
950 hours
Production Status:
Existing-updated
Use:
Language Identification
- Paper title: Modeling and training strategies for language recognition systems
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2008 NIST Speaker Recognition Evaluation Training Set Part 2 | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Cantonese English French German Gishu Greek Gujarati Hebrew Hindi Indonesian Japanese Korean Mandarin Persian Portuguese Runyankore Russian Spanish Turkish Vietnamese
Availability:
Freely Available
License:
OpenSource
Size:
22.8 GByte
Production Status:
Newly created-in progress
Use:
Speech Recognition/Understanding
- Paper title: Speaking rate, information density, and information rate in first-language and second-language speech
- Paper track: 1.10 Bilingual and L2 acquisition and processing/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ann Bradlow | The ALLSSTAR Corpus | /N |
Documentation:
Documentation in English is available to the public (via the project website)
Written
Corpus,
Language Type:
Bilingual
Languages:
English Russian
Availability:
Freely Available
License:
Size:
None
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Elena Voita | OpenSubtitles | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Bilingual
Languages:
English Russian
Availability:
Freely Available
License:
Size:
5.17 MByte
Production Status:
Existing-used
Use:
Lexicon Creation/Annotation
- Paper title: Studying Taxonomy Enrichment on Diachronic WordNet Versions
- Paper track: Long paper/
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Irina Nikishina | Taxonomy Enrichment (based on WordNet) | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Albanian Croatian English German Russian Turkish
Availability:
Freely Available
License:
Size:
10 MByte
Production Status:
Existing-updated
Use:
Document Classification, Text categorisation
- Paper title: XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages
- Paper track: Long paper/
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mladen Karan | XHate-999 | /N |
Documentation:
An accompanying paper details the dataset creation, and a short readme with technical details accompanies the dataset.
Written
Evaluation Data,
Language Type:
Monolingual
Languages:
Russian
Availability:
Freely Available
License:
Creative Commons Attribution-ShareAlike 4.0 International License
Size:
140 words
Production Status:
Newly created-finished
Use:
Lexical semantic change detection
- Paper title: RuSemShift: a dataset of historical lexical semantic change in Russian
- Paper track: Long paper/
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Julia Rodina | RuSemShift | /N |
Documentation:
https://github.com/juliarodina/RuSemShift




